DETAILED ACTION

1. Applicant has amended claims 1 and 10 in the amendment filed on
06/05/2008. Claims 6 and 15 are canceled. Claims 1, 3-5, 8-10, 12-14, 17 and 18
are pending in this Office Action.

2. A request for continued examination under 37 CFR 1.114, including the 
fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.
Since this application is eligible for continued examination under 37 CFR 1.114, 
and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the 
previous Office action has been withdrawn pursuant to 37 CFR 1.114.
Applicant's submission filed on 06/05/2008 has been entered. 

Response to Arguments 

3. Applicant's arguments with respect to claims 1, 3-5, 8-10, 12-14, 17 and
18 have been considered but are moot in view of the new ground(s) of rejection.

Applicant argued that Agrawal neither teaches nor suggests repeating
the large classification outputting in association with new DT matrix generation 
for each cluster. Examiner respectfully disagrees. 

In response to Applicant's argument, Agrawal teaches a system for
organizing a large text database into a hierarchy of topics and for maintaining this 
organization as documents are added and deleted and as the topic hierarchy 
changes (abstract). Large sub-trees in the topic tree can be eliminated forthwith if 
the score of the root of those sub-trees are very poor. Text database population 
is not the only application of fast multi-level classification. With increasing 
connectivity, it will be inevitable that some searches will go out to remote text 
servers and retrieve results that must then be classified in real time (page 8,
paragraph 0131). The user restricts the topical context using a suitable selection 
on the taxonomy. Then a plurality of relevant documents which also adhere to 
the topic restrictions is retrieved. In a preferred embodiment, each document in 
the database has been pre-classified. The user is presented with a suitable 
display of those portions of the taxonomy where relevant documents were found. 
The user may then enter a command through the user input device to cause the 
system to select at least one of the displayed sub-topics. This process is 
repeated as necessary to refine the query topic until the user's information need 
is satisfied (page 5, paragraph 0084). 

Applicant argued that Tokuda does not explicitly teach the "term list
edition module" (original claim 6) that is herein added to claim 1. Examiner
respectfully disagrees.

In response to Applicant's argument, Bent teaches that an initial
document-by-term matrix is formed, each document being represented by a
respective M dimensional vector, where M represents the number of terms or
words in a predetermined domain of documents. The techniques of text mining
currently include the automatic indexing of documents, extraction of key words
and terms, grouping/clustering of similar documents, categorising of documents
into pre-defined categories and document summarization (page 1, paragraphs
0010-0011).

TextFormatter reads both the textual document in the document set and 
the term list generated (page 4, paragraph 0060; see also element 305 of figure 
3). The text from the document is read in and tokenised into sentences.
Sentences again are tokenised into words. Now the sentences have to be 
checked for terms that have an entry in the hashtable. Since it is possible that 
words which are part of a composed term occur as single words as well, it is 
necessary to check a sentence backwards. That is, firstly the hashtable is 
searched for a test string which consists of the whole sentence. When no valid 
entry is found one word is removed from the end of the test string and the 
hashtable is searched again. This is repeated until either a valid entry was found 
or only a single word remains (page 4, paragraph 0060-0069). 

Claim Rejections - 35 USC § 101 

4. Claims 1, 3-5 and 8-9 are rejected under 35 U.S.C. 101 because the
language of the claim raises a question as to whether the claim is directed
merely to an abstract idea that is not tied to a technological art, environment or
machine which would result in a practical application producing a concrete,
useful, and tangible result to form the basis of statutory subject matter under 35
U.S.C. 101.

Claims 1, 3-5 and 8-9 lack the necessary physical articles or objects to
constitute a machine or a manufacture within the meaning of 35 U.S.C. 101. They
are clearly not a series of steps or acts to be a process nor are they a combination
of chemical compounds to be a composition of matter. As such, they fail to fall
within a statutory category. They are, at best, functional descriptive material per
se.

Descriptive material can be characterized as either "functional descriptive
material" or "nonfunctional descriptive material." Both types of "descriptive
material" are nonstatutory when claimed as descriptive material per se, 33 F.3d
at 1360, 31 USPQ2d at 1759. When functional descriptive material is recorded
on some computer-readable medium, it becomes structurally and functionally
interrelated to the medium and will be statutory in most cases since use of
technology permits the function of the descriptive material to be realized.
Compare In re Lowry, 32 F.3d 1579, 1583-84, 32 USPQ2d 1031, 1035 (Fed. Cir.
1994).

Merely claiming nonfunctional descriptive material, i.e., abstract ideas
stored on a computer-readable medium, in a computer, or on an electromagnetic 
carrier signal, does not make it statutory. See Diehr, 450 U.S. at 185-86, 209 
USPQ at 8 (noting that the claims for an algorithm in Benson were unpatentable 
as abstract ideas because "[t]he sole practical application of the algorithm was in 
connection with the programming of a general purpose computer."). 

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness 
rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as 
set forth in section 102 of this title, if the differences between the subject matter sought to be 
patented and the prior art are such that the subject matter as a whole would have been obvious 
at the time the invention was made to a person having ordinary skill in the art to which said 
subject matter pertains. Patentability shall not be negatived by the manner in which the invention 
was made. 

This application currently names joint inventors. In considering 
patentability of the claims under 35 U.S.C. 103(a), the examiner presumes that
the subject matter of the various claims was commonly owned at the time any 
inventions covered therein were made absent any evidence to the contrary. 
Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor
and invention dates of each claim that was not commonly owned at the time a 
later invention was made in order for the examiner to consider the applicability of 
35 U.S.C. 103(c) and potential 35 U.S.C. 102(e), (f) or (g) prior art under 35 
U.S.C. 103(a). 

5. Claims 1, 3-5, 8-10, 12-14 and 17-18 are rejected under 35 U.S.C. 103(a)
as being unpatentable over Tokuda et al. (US Patent No. 7,024,400 B2,
hereinafter "Tokuda") in view of Glover (US Patent Application No. 2003/0221163
A1, hereinafter "Glover"), Agrawal et al. (US Patent Application No.
2001/0037324 A1, hereinafter "Agrawal") and Bent et al. (US Patent Application
No. 2004/0205457 A1, hereinafter "Bent").

As to claims 1 and 10, Tokuda teaches the claimed limitations: 

"A sentence classification device characterized" as document 
classification is important not only in office document processing but also in 
implementing an efficient information retrieval system (column 1, lines 13-15). 

"A term list having a plurality of terms each comprising not less than one 
word" as a term is defined as a word or a phrase that appears in at least two 
documents (column 4, lines 5-6). 

"DT matrix generation module for generating a DT matrix two- 
dimensionally expressing a relationship between each document contained in a 
document set and said each term" as the term by document matrix of the original 
documents (column 9, line 23; see also table 1).
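
For orientation only, the kind of two-dimensional document/term structure recited in this limitation can be sketched as follows. This is a minimal sketch and not Tokuda's implementation: the helper name `build_dt_matrix`, the whitespace tokenization, the restriction to single-word terms, and the use of raw occurrence counts as cell values are all assumptions made for illustration.

```python
from typing import List


def build_dt_matrix(documents: List[str], terms: List[str]) -> List[List[int]]:
    """Return a matrix with one row per document and one column per term.

    Hypothetical helper: cell [d][t] counts how often single-word term t
    occurs in document d (lower-cased, whitespace-tokenized).
    """
    matrix = []
    for text in documents:
        words = text.lower().split()
        matrix.append([words.count(term.lower()) for term in terms])
    return matrix


# Illustrative data only; not taken from any cited reference.
docs = ["the cat sat", "the dog sat on the cat", "birds fly"]
term_list = ["cat", "dog", "sat"]
dt = build_dt_matrix(docs, term_list)
```

Each row of `dt` corresponds to one document and each column to one entry of the term list, so a document that contains none of the listed terms yields an all-zero row.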

"DT matrix transformation module for generating a transformed DT matrix 
having clusters having blocks of associated documents by transforming the DT 
matrix obtained by said DT matrix generation module on the basis of a DM 
decomposition method" as exploiting the singular vector decomposition method,
the major left singular vectors associated with the largest singular values are
selected as a major vector space called an intra-DLSI space, or an I-DLSI space
(column 3, lines 2-5). The extra-DLSI space or the E-DLSI space can similarly be
obtained by setting up a differential term by extra-document matrix where each 
column of the matrix denotes a differential document vector between the 
document vector and the centroid vector of the cluster, which does not include 
the document. The extra-DLSI space may then be constructed by the major left 
singular vectors associated with the largest singular values (column 3, lines 18- 
25). 

"Classification generation module for generating classifications associated 
with the document set on the basis of a relationship between each cluster on the 
transformed DT matrix obtained by said DT matrix transformation module and 
said each document classified according to the clusters" as given a new 
document to be classified, a best candidate cluster to be recalled from the 
clusters can be selected from among those clusters having the highest 
probabilities of being the given differential intra-document vector (column 3, lines 
10-13). 

Through the differences in word usage between the document and a cluster's
centroid vector, the differential document vector is capable of capturing the
relation between the particular document and the cluster (column 2, lines 41-46).

Tokuda does not explicitly teach the claimed limitation "wherein the 
classification generation module comprises a virtual representative document 
generation module for generating a virtual representative document, for each
cluster on a transformed DT matrix, from a term of each document belonging to 
the cluster; and generating and outputting an index indicating validity of the 
edition from the DT matrices". 

Glover teaches using a virtual document comprising extended anchortext 
to determine whether a web page is to be classified into a given category (page 
2, paragraph 0013). 

Glover further teaches generating a classification output of the target web page utilizing a trained
full-text classifier; and combining the classification output of the virtual document 
classifier and the classification output of the full-text classifier to generate a 
combined classification output for the target web page (page 2, paragraph 0018). 

The web page downloader may easily be replaced by a data cache or an 
index, which can easily provide the text for the target web page without having to 
download the target web page. The full-text classifier, after being trained using 
web page documents, determines a classification output. The full-text classifier 
comprises a learning algorithm, which is trained as described below to produce a 
prediction rule, which after the full-text classifier is trained actually evaluates the 
target web page to predict whether the target web page is a member of a positive 
set (page 4, paragraph 0032). 

Tokuda does not explicitly teach the claimed limitation "large classification 
generation module for generating a large classification of documents from each 
document in a bottom-up manner by repeatedly performing hierarchical 
clustering processing of setting a DT matrix generated by said DT matrix 
generation module in an initial state, causing said virtual representative
document generation module to generate a virtual representative document for 
each cluster on a transformed DT matrix generated from the DT matrix by said 
DT matrix transformation module, generating a new DT matrix used for next 
hierarchical clustering processing by adding the virtual representative document 
to the transformed DT matrix and deleting documents belonging to the cluster of 
the virtual representative document from the transformed DT matrix, and 
outputting, for said each cluster, information associated with the documents 
constituting the cluster as large classification data". 

Agrawal teaches a system for organizing a large text database into a hierarchy of
topics and for maintaining this organization as documents are added and deleted 
and as the topic hierarchy changes (abstract). For such classifiers, feature sets 
larger than 100 are considered extremely large. Document classification may 
require more than 50,000. Singular value decomposition on the term-document 
matrix has been found to cluster semantically related documents together even if 
they do not share keywords (page 2, paragraph 0019-0021). 

The feature set changes by context as the classification process proceeds 
down the taxonomy. As a result, jargon common to lower nodes of the taxonomy 
are filtered out and the classification accuracy remains high in spite of the 
reduction in the number of terms and candidate classes inspected (page 3, 
paragraph 0029). 

Each document in the database has been pre-classified. The user may 
then enter a command through the user input device to cause the system to 
select at least one of the displayed sub-topics. This process is repeated as
necessary to refine the query topic until the user's information need is satisfied 
(page 5, paragraph 0084). 

A parent class inherits, in an additive fashion, the statistics of its children, 
since each training document generates rows for each topic node from the 
assigned topic up to the root (page 13, paragraph 0204). 
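
The additive inheritance described in this passage can be illustrated with a short sketch. It is offered under stated assumptions and is not Agrawal's implementation: the function name `accumulate_statistics`, the `parent_of` mapping, the sample taxonomy, and the use of plain term counts as the "statistics" are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def accumulate_statistics(
    parent_of: Dict[str, Optional[str]],
    labeled_docs: List[Tuple[str, str]],
) -> Dict[str, Counter]:
    """Add each document's term counts to its topic and every ancestor."""
    stats: Dict[str, Counter] = {node: Counter() for node in parent_of}
    for topic, text in labeled_docs:
        counts = Counter(text.lower().split())
        node: Optional[str] = topic
        while node is not None:  # walk from the assigned topic up to the root
            stats[node].update(counts)
            node = parent_of[node]
    return stats


# Hypothetical taxonomy: root -> science -> {biology, chemistry}
parent_of = {"root": None, "science": "root", "biology": "science", "chemistry": "science"}
training = [("biology", "cell gene cell"), ("chemistry", "acid cell")]
stats = accumulate_statistics(parent_of, training)
```

Because every training document contributes to each node on its path to the root, the counts held at `science` equal the sum of the counts held at `biology` and `chemistry`.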

Tokuda teaches preprocessing documents using said computer
to distinguish terms of a word and a noun phrase from stop words; constructing 
system terms by setting up a term list as well as global weights using said 
computer (claim 1). The method includes the setting up of a differential latent 
semantics index (DLSI) space-based classifier to be stored in computer storage 
and the use of such classifier by a computer to evaluate the possibility of a 
document belonging to a given cluster using a posteriori probability function 
(abstract). 

Tokuda does not explicitly teach the claimed limitation "term list edition 
module for adding or deleting an arbitrary term with respect to the term list; and 
index generation module for making said DT matrix generation module generate 
DT matrices by using term lists before and after edition by said term list edition 
module". 

Bent teaches that an initial document-by-term matrix is formed, each document
being represented by a respective M dimensional vector, where M represents the 
number of terms or words in a predetermined domain of documents. The 
techniques of text mining currently include the automatic indexing of documents, 
extraction of key words and terms, grouping/clustering of similar documents,
categorising of documents into pre-defined categories and document 
summarization (page 1, paragraph 0010-0011). 

TextFormatter reads both the textual document in the document set and 
the term list generated (page 4, paragraph 0060; see also element 305 of figure 
3). The text from the document is read in and tokenised into sentences. 
Sentences again are tokenised into words. Now the sentences have to be 
checked for terms that have an entry in the hashtable. Since it is possible that 
words which are part of a composed term occur as single words as well, it is 
necessary to check a sentence backwards. That is, firstly the hashtable is 
searched for a test string which consists of the whole sentence. When no valid 
entry is found one word is removed from the end of the test string and the 
hashtable is searched again. This is repeated until either a valid entry was found 
or only a single word remains. 
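
The backward check described above can be sketched as follows. This is an illustrative reading of the quoted passage rather than Bent's code: the function name `find_terms` and the sample table are hypothetical, a Python `set` stands in for the hashtable, and resuming the scan immediately after a matched or skipped span is an assumption, since the passage describes only how the test string is shortened.

```python
from typing import List, Set


def find_terms(sentence_words: List[str], term_table: Set[str]) -> List[str]:
    """Greedy longest-match lookup of (possibly multi-word) terms."""
    found = []
    start = 0
    while start < len(sentence_words):
        end = len(sentence_words)
        # Shrink the test string from the end until it is a known term
        # or only a single word remains.
        while end > start + 1 and " ".join(sentence_words[start:end]) not in term_table:
            end -= 1
        candidate = " ".join(sentence_words[start:end])
        if candidate in term_table:
            found.append(candidate)
        # Assumption: resume scanning right after the matched (or skipped) span.
        start = end
    return found


# Hypothetical term table in which "text" is both a term and part of "text mining".
table = {"text mining", "text", "document"}
matches = find_terms("text mining groups each document".split(), table)
```

Checking the longest candidate first is what keeps the composed term "text mining" from being split into the single-word term "text".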

To be admitted as a column of the term-sentence matrix, a term must 
occur in the sentences of the document set more often than a minimum 
frequency, whereby a user or administrator may determine the minimum 
frequency. For instance, it is illogical to add terms to the matrix that occur only 
once, as the objective is to find clusters of sentences which have terms in 
common. Next, the document vector is searched for all occurrences of term #1 of 
the term vector. If the term occurs at least as often as the specified minimum 
frequency, it remains in the term vector and if the term occurs less often, it is 
removed. Since actor occurs only once in the document vector, the term is
deleted from the head of the term vector (page 4, paragraph 0060-0069). 
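
The minimum-frequency pruning of the term vector can likewise be sketched in a few lines. This is a minimal sketch under assumptions: the function name `filter_terms` and the sample vectors are hypothetical, and the "at least as often as" reading of the threshold is used.

```python
from typing import List


def filter_terms(term_vector: List[str], document_vector: List[str], min_frequency: int) -> List[str]:
    """Keep only terms that occur at least min_frequency times."""
    return [term for term in term_vector if document_vector.count(term) >= min_frequency]


# Illustrative data: "actor" occurs once and is dropped at a threshold of 2.
document_vector = ["actor", "film", "director", "film", "director", "film"]
kept = filter_terms(["actor", "film", "director"], document_vector, min_frequency=2)
```

With a threshold of 2, the single occurrence of "actor" falls below the minimum, so the term is removed while "film" and "director" remain.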

Therefore, it would have been obvious to one of ordinary skill in the art at 
the time the invention was made, having the teachings of Tokuda, Glover, 
Agrawal and Bent before him/her, to modify Tokuda for adding or deleting an 
arbitrary term with respect to the term list because that would provide a 
technique that discovers topics from within a collection of electronically stored 
documents and automatically extracts and summarises topics as taught by Bent 
(page 1, paragraph 0013). 

As to claims 3 and 12, Tokuda teaches the claimed limitations: 
"Characterized by further comprising label generation module for 
outputting each term strongly connected to each document belonging to said 
arbitrary cluster as a label indicating a classification of the cluster" as a new 
efficient supervised document classification procedure is introduced, whereby
learning from a given number of labeled documents preclassified into a finite 
number of appropriate clusters in the database, the classifier developed will 
select and classify any of new documents introduced into an appropriate cluster 
within the classification stage (column 2, lines 21-25). 

Application/Control Number: 10/563,311
Art Unit: 2162

As to claims 4 and 13, Tokuda teaches that the extra-DLSI space, or the
E-DLSI space, can similarly be obtained by setting up a differential term by
extra-document matrix where each column of the matrix denotes a differential
document vector between the document vector and the centroid vector of the
cluster which does not include the document (column 3, lines 18-23).

Tokuda does not explicitly teach the claimed limitation "Characterized by 
further comprising document organization module for sequentially outputting 
documents belonging to said arbitrary cluster or all documents in an arrangement 
order of the documents in the transformed DT matrix". 

Agrawal teaches that, given k*(c), the sorted Fisher table is scanned while
copying the first k*(c) rows for the run corresponding to class c to an output table 
and discarding the remaining terms. This involves completely sequential IO 
(page 12, paragraph 0187). 
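
The sequential scan described in this passage can be pictured with the sketch below. It is a sketch under assumptions, not Agrawal's implementation: the row layout `(class, term, score)`, the function name `select_top_terms`, and the sample rows are hypothetical, and the input is assumed to be already sorted by class and then by descending score.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]  # hypothetical layout: (class, term, score)


def select_top_terms(sorted_rows: List[Row], k_star: Dict[str, int]) -> List[Row]:
    """Single sequential pass: keep the first k*(c) rows of each class run."""
    output = []
    for cls, run in groupby(sorted_rows, key=lambda row: row[0]):
        for position, row in enumerate(run):
            if position < k_star.get(cls, 0):
                output.append(row)  # copy to the output table
            # rows past k*(c) are discarded
    return output


rows = [
    ("arts", "paint", 0.9), ("arts", "canvas", 0.7), ("arts", "the", 0.1),
    ("science", "atom", 0.8), ("science", "of", 0.2),
]
top = select_top_terms(rows, {"arts": 2, "science": 1})
```

The table is read once from front to back and never revisited, which corresponds to the completely sequential access the passage refers to.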

Therefore, it would have been obvious to one of ordinary skill in the art at 
the time the invention was made, having the teachings of Tokuda, Glover, 
Agrawal and Bent before him/her, to modify Tokuda to include the document
organization module because that would improve document search performance,
including speed and accuracy, as taught by Agrawal (page 14, paragraph 0216).

As to claims 5 and 14, Tokuda teaches the claimed limitations: 
"Characterized by further comprising summary generation module for 
outputting, as a summary of said arbitrary document, a sentence of sentences 
constituting the document which contains a term strongly connected to the 
document" as the setting up of a DLSI space-based classifier is summarized. 
Documents are preprocessed, to identify and distinguish terms, either of the word 
or noun phrase, from stop words. System terms are then constructed, by setting 
up the term list as well as the global weights. The process continues with
normalization of the document vectors, of all the collected documents, as well as 
the centroid vectors of each cluster. Following document vector normalization, 
the differential term by document matrices may be constructed by intra-document 
or extra-document construction (column 7, lines 24-34). 

As to claims 8 and 17 

Tokuda does not explicitly teach the claimed limitation "characterized in that 
said large classification generation module terminates repetition of the clustering 
processing when no cluster is obtained from the transformed DT matrix in the 
clustering processing". 

Agrawal teaches each of the other second level topics may be divided at 
the third level to further topics. Also, in a similar fashion, further levels under the 
third level may be included in the topic hierarchy, or taxonomy. The final level of 
each path in the taxonomy terminates at a terminal or leaf node (page 6, 
paragraph 0087). Large sub-trees in the topic tree can be eliminated forthwith if 
the score of the root of those sub-trees are very poor (page 8, paragraph 0131). 

Therefore, it would have been obvious to one of ordinary skill in the art at 
the time the invention was made, having the teachings of Tokuda, Glover, 
Agrawal and Bent before him/her, to modify Tokuda to terminate repetition of the
clustering processing because that would provide a means for designing vastly
enhanced searching, browsing and filtering systems as taught by Agrawal (page
1, paragraph 0009).

As to claims 9 and 18, Tokuda teaches the claimed limitations:
"Characterized by further comprising large classification label generation 
module for, if a virtual representative document is contained in a given cluster of 
clusters obtained by the clustering processing" as a new efficient supervised 
document classification procedure, whereby learning from a given number of 
labeled documents preclassified into a finite number of appropriate clusters in the 
database, the classifier developed will select and classify any of new documents 
introduced into an appropriate cluster within the classification stage (column 2, 
lines 22-28). 

Tokuda does not explicitly teach the claimed limitation "generating a label 
of the cluster on which the virtual representative document is based from a term 
strongly connected to the virtual representative document". 

Glover teaches the virtual document classifier comprises the learning 
algorithm (not shown) that accepts as input a set of labeled input virtual 
documents. From the labeled input virtual documents the learning algorithm 
generates a prediction rule. After the virtual document classifier 106 is trained, a 
new unlabeled virtual document can be evaluated by the prediction rule to predict 
its label (page 4, paragraph 0031).

Also, Agrawal teaches that with reference to the hierarchy represented, 
statistics are calculated for the science node, based on the terms in all of the 
documents from the collection set that are classified in classes represented by 
nodes below the science node, including the nodes labeled biology, chemistry,
electronics, and all children nodes of those nodes (page 6, paragraph 0093).
Large sub-trees in the topic tree can be eliminated forthwith if the score of the 
root of those sub-trees are very poor. Text database population is not the only 
application of fast multi-level classification. With increasing connectivity, it will be 
inevitable that some searches will go out to remote text servers and retrieve 
results that must then be classified in real time (page 8, paragraph 0131). 

Therefore, it would have been obvious to one of ordinary skill in the art at 
the time the invention was made, having the teachings of Tokuda, Glover, 
Agrawal and Bent before him/her, to modify Tokuda to generate the cluster label
from a term strongly connected to the virtual representative document because
that would provide a system which is
sufficiently fast as taught by Agrawal (page 2, paragraph 0025). 

Contact Information 

Any inquiry concerning this communication or earlier communications from
the examiner should be directed to James Hwa whose telephone number is
571-270-1285. The examiner can normally be reached on 8:00 - 5:00. If attempts to
reach the examiner by telephone are unsuccessful, the examiner's supervisor,
Don Wong, can be reached on 571-272-1834. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from
the Patent Application Information Retrieval (PAIR) system. Status information
for published applications may be obtained from either Private PAIR or Public
PAIR. Status information for unpublished applications is available through Private
PAIR only. For more information about the PAIR system, see
http://pair-direct.uspto.gov. Should you have questions on access to the PAIR
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative or
access to the automated information system, call 800-786-9199 (IN USA OR
CANADA) or 571-272-1000.